UNCORRECTED DRAFT. For the final version, see Automated Building of

Author

  • Marcin Miłkowski
Abstract

For most languages, including Polish, large error corpora are lacking. Traditional error corpora are collected and annotated by linguists, in a process that is manual or only slightly automated. The task is therefore tedious and costly, and the results reflect linguists' knowledge about correct usage, so additional work is required to avoid theory-laden distortion of the data. In this paper, I show how to develop error corpora automatically from the revision histories of documents. The idea rests on the hypothesis that the most frequent minor edits in documents are corrections of typos, slips of the tongue, and mistakes in grammar, usage and style. This hypothesis has been confirmed by a frequency analysis of the revision history of articles in the Polish Wikipedia. Partial results of the analysis and prospects for integrating the error corpus with the Polish National Corpus are presented.
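The core idea of the abstract, mining frequent minor edits from revision histories, can be illustrated with a minimal sketch. This is not the author's implementation: the function names `minor_edits` and `count_corrections`, the whitespace tokenization, and the `max_tokens` threshold for what counts as a "minor" edit are all assumptions made for illustration. It diffs consecutive revisions with Python's standard `difflib` and counts short token-level replacements as candidate corrections.

```python
from collections import Counter
from difflib import SequenceMatcher


def minor_edits(old, new, max_tokens=2):
    """Yield (before, after) pairs for short token-level replacements.

    Assumption: an edit is "minor" if it replaces at most `max_tokens`
    whitespace-separated tokens with at most `max_tokens` tokens.
    """
    a, b = old.split(), new.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) <= max_tokens and (j2 - j1) <= max_tokens:
            yield " ".join(a[i1:i2]), " ".join(b[j1:j2])


def count_corrections(revisions):
    """Count candidate corrections across consecutive revisions of one document."""
    counts = Counter()
    for old, new in zip(revisions, revisions[1:]):
        counts.update(minor_edits(old, new))
    return counts


if __name__ == "__main__":
    # Invented toy revision history, not data from the paper.
    history = [
        "Trzeba wziąść książkę",
        "Trzeba wziąć książkę",
        "Trzeba wziąć tę książkę",
    ]
    for (before, after), n in count_corrections(history).most_common():
        print(f"{before} -> {after}: {n}")
```

In a real setting the counts would be aggregated over many documents, and only pairs above some frequency threshold would be kept as likely error/correction pairs; pure insertions and deletions are ignored here for simplicity.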


Similar resources

Face off Discourses in the Islamic Republic of Iran Constitution Council (Discourse of Jurist Islam versus the Liberal Islam)

The present study aimed to examine the position of the "Jurist Islam" and "Liberal Islam" discourses in the "Final Review of the Islamic Republic of Iran Constitution" in 1979, and to consider the conflict between these constitutional discourses to gain a better understanding of them and of the constitution. To this aim, should find the point of the deba...


Guideline on good pharmacovigilance practices (GVP) Product- or Population-Specific Considerations I: Vaccines for prophylaxis against infectious diseases

Revised draft agreed by ERMS FG: 11 November 2013. Revised draft adopted by Executive Director as final: 9 December 2013. Date for coming into effect after finalisation: 13 December 2013. This track-change version identifies the majority of changes introduced to the public consultation version of this document as the Agency’s response to the comments received from the public consultation. This trac...


Neural blackboard architectures of combinatorial structures in cognition

Below is the unedited, uncorrected final draft of a BBS target article that has been accepted for publication. This preprint has been prepared for potential commentators who wish to nominate themselves for formal commentary invitation. Please DO NOT write a commentary until you receive a formal invitation. If you are invited to submit a commentary, a copyedited, corrected version of this paper ...


IST Project IST-2000-29243 OntoWeb OntoWeb: Ontology-based Information Exchange for Knowledge Management and Electronic Commerce D21 Successful Scenarios for Ontology-based Applications v1.0 OntoWeb Ontology-based information exchange for knowledge management and electronic commerce

This deliverable presents a first version of a series of 5 documents, to be delivered every 6 months, whose aim is to give practitioners in the field of large E-Commerce and Knowledge Management guidelines for the application of Knowledge IT. This series of documents is primarily aimed at anyone who is involved in the process of designing, building and managing Kno...


To: Academic Senate Committees

We are sending you a draft version of the current list of recommendations of the ADVANCE Policy and Practices Review Initiative Committee (PPRI). This is a true draft. We will submit the draft to the campus community for comment and as an initial ideas sweep to see if there are additional recommendations that should be included in this document. The final version of this document will be includ...



Publication date: 2009